Improving the Quality of Linked Data Using Statistical Distributions
Abstract
Linked Data on the Web is created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types to enhance the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither algorithm uses external knowledge, i.e., both operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms have been used for building the DBpedia 3.9 release: with SDType, 3.4 million missing type statements were added, while with SDValidate, 13,000 erroneous RDF statements were removed from the knowledge base.
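The idea of inferring a missing type from the statistical distribution of properties and types can be illustrated with a small sketch. This is a simplified illustration only, not the authors' SDType implementation: the abstract does not give the weighting scheme, so the sketch assumes a plain average over the properties a resource uses. The function names, the toy triples, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def learn_type_distributions(triples, types):
    """Estimate P(type | subject uses property) from already-typed subjects."""
    counts = defaultdict(Counter)  # property -> type -> number of typed subjects
    totals = Counter()             # property -> number of typed subjects
    seen = set()                   # count each (subject, property) pair once
    for subj, prop, _obj in triples:
        if subj not in types or (subj, prop) in seen:
            continue
        seen.add((subj, prop))
        totals[prop] += 1
        for t in types[subj]:
            counts[prop][t] += 1
    return {p: {t: c / totals[p] for t, c in tc.items()}
            for p, tc in counts.items()}


def predict_types(resource, triples, dist, threshold=0.5):
    """Average the per-property type distributions (uniform weights: an assumption)."""
    props = {p for s, p, _o in triples if s == resource and p in dist}
    if not props:
        return {}
    scores = Counter()
    for p in props:
        for t, prob in dist[p].items():
            scores[t] += prob / len(props)
    return {t: s for t, s in scores.items() if s >= threshold}


# Toy data (hypothetical): Hamburg has no type statement yet.
triples = [
    ("Berlin", "country", "Germany"),
    ("Berlin", "populationTotal", "3600000"),
    ("Paris", "country", "France"),
    ("Paris", "populationTotal", "2100000"),
    ("Einstein", "birthPlace", "Ulm"),
    ("Einstein", "country", "Germany"),
    ("Hamburg", "country", "Germany"),
    ("Hamburg", "populationTotal", "1800000"),
]
types = {"Berlin": {"City"}, "Paris": {"City"}, "Einstein": {"Person"}}

dist = learn_type_distributions(triples, types)
predicted = predict_types("Hamburg", triples, dist)
print(predicted)
```

Only the data itself is used here, in line with the abstract's statement that no external knowledge is required. Under these assumptions, a noisy property such as `country` (used by both cities and persons) contributes a weaker vote than a more discriminative one such as `populationTotal`.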
Similar resources
Designing Tolerance of Assembled Components Using Weibull Distribution
Tolerancing is one of the most important tools for planning, controlling, and improving quality in the industry. Tolerancing conducted by design engineers to meet customers’ needs is a prerequisite for producing high-quality products. Engineers use handbooks to conduct tolerancing. While use of statistical methods for tolerancing is not a new concept, engineers often use known distributions, in...
Bayesian and Iterative Maximum Likelihood Estimation of the Coefficients in Logistic Regression Analysis with Linked Data
This paper considers logistic regression analysis with linked data. It is shown that, in logistic regression analysis with linked data, a finite mixture of Bernoulli distributions can be used for modeling the response variables. We propose an iterative maximum likelihood estimator for the regression coefficients that takes the matching probabilities into account. Next, the Bayesian counterpart...
Statistical Wavelet-based Image Denoising using Scale Mixture of Normal Distributions with Adaptive Parameter Estimation
Removing noise from images is a challenging problem in digital image processing. This paper presents an image denoising method based on a maximum a posteriori (MAP) density function estimator, which is implemented in the wavelet domain because of its energy compaction property. The performance of the MAP estimator depends on the proposed model for noise-free wavelet coefficients. Thus in the wa...
On a New Bimodal Normal Family
Unimodal distributions are frequently used in theoretical statistical studies. But in applied statistics, there are many situations in which unimodal distributions cannot be fitted to the data. For example, the distribution of the data outside the control zone in quality control, or of outlier observations in linear models and time series, may need to be bimodal. These situations, oc...
Quality of Life and Its Related Factors in Hypertensive Patients
Introduction: Hypertension is one of the most common chronic diseases in the world today. Various studies have shown that people with hypertension have a lower quality of life than people with normal blood pressure. Despite various studies on hypertension, little information is available on the quality of life of patients with hypertension. Due to the relative ambiguity in this regard, the study has ...
Journal:
- Int. J. Semantic Web Inf. Syst.
Volume 10, Issue
Pages -
Publication date 2014